The execution flow of an Experiment is:
• The Tuner receives the search space and generates configurations (illustrated right after this list).
• The generated configurations are submitted to the training platform(s) to run trials.
• The training results from each platform are sent back to the Advisor.
• New configurations are generated (if needed) for the next round of training.
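In other words, a "configuration" is simply one concrete set of hyperparameter values, and what a trial sends back is a single metric. A minimal illustration (the parameter names come from the MNIST search space shown later in this chapter; the values themselves are made up):

# One configuration generated by the Tuner: a concrete set of hyperparameter values.
configuration = {"batch_size": 64, "hidden_size": 256, "lr": 0.001, "momentum": 0.4}

# What the trial eventually reports back: a single number, e.g. test accuracy.
final_result = 0.97

# Based on results like this, the Tuner decides which configurations to try next.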
From the user's point of view, the workflow is:
• Define the search space by writing a YAML or JSON file in the required format. This example uses YAML.
• Modify the existing model code by adding the NNI API. (Only three lines of code starting with nni need to be inserted; see the sketch right after this list.)
• Define the experiment configuration: set the required parameters in the config.yml file.
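Those three nni lines follow NNI's trial API: fetch a configuration, report intermediate results, and report the final result. A minimal sketch of a trial script (the training code itself is only a placeholder here, not the actual mnist.py):

import nni

def train_and_evaluate(params, epoch):
    # Placeholder for the real training / validation code in mnist.py.
    return 0.9 + epoch * 0.001

params = nni.get_next_parameter()              # 1) receive a configuration from the Tuner

for epoch in range(10):
    accuracy = train_and_evaluate(params, epoch)
    nni.report_intermediate_result(accuracy)   # 2) report progress to NNI after each epoch

nni.report_final_result(accuracy)              # 3) report the final metric, e.g. test accuracy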
Skim through the YAML below for now; it will make much more sense in the later chapters, when you install NNI and verify it yourself.
# Example of config_detailed.yml. Putting all hyperparameters in a single YAML file makes it easier to explain and understand.
# This example shows more configurable fields compared to the minimal "config.yml".
# You can use "nnictl create --config config_detailed.yml" to launch this experiment.
# If you see an error message saying "port 8080 is used",
# use "nnictl stop --all" to stop previous experiments.
experimentName: MNIST # An optional name to help you distinguish experiments.
# Hyper-parameter search space can either be configured here or in a separate file.
# "config.yml" shows how to specify a separate search space file.
# The common schema of search space is documented here:
# https://nni.readthedocs.io/en/stable/Tutorial/SearchSpaceSpec.html
searchSpace:
  batch_size:
    _type: choice
    _value: [16, 32, 64, 128]
  hidden_size:
    _type: choice
    _value: [128, 256, 512, 1024]
  lr:
    _type: choice
    _value: [0.0001, 0.001, 0.01, 0.1]
  momentum:
    _type: uniform
    _value: [0, 1]
trialCommand: python3 mnist.py
# The command to launch a trial. NOTE: change "python3" to "python" if you are using Windows.
trialCodeDirectory: .
# The path of trial code.
# By default it's ".", which means the same directory of this config file.
trialGpuNumber: 1
# How many GPUs should each trial use. CUDA is required when it's greater than zero.
trialConcurrency: 4 # Run 4 trials concurrently.
maxTrialNumber: 10 # Generate at most 10 trials.
maxExperimentDuration: 1h # Stop generating trials after 1 hour.
# Configure the tuning algorithm.
tuner:
  name: TPE
  # Supported algorithms: TPE, Random, Anneal, Evolution, GridSearch, GPTuner, PBTTuner, etc.
  # Full list: https://nni.readthedocs.io/en/latest/Tuner/BuiltinTuner.html
  classArgs: # Algorithm specific arguments. See the tuner's doc for details.
    optimize_mode: maximize # "minimize" or "maximize"
# Configure the training platform.
# Supported platforms: local, remote, openpai, aml, kubeflow, kubernetes, adl.
trainingService:
  platform: local
  useActiveGpu: false
  # NOTE: Use "true" if you are using an OS with graphical interface
  # (e.g. Windows 10, Ubuntu desktop)
  # Reason and details:
  # https://nni.readthedocs.io/en/latest/reference/experiment_config.html#useactivegpu
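One detail worth noting in the searchSpace above: _type: choice picks one of the listed values, while _type: uniform draws a continuous value between the two bounds. A rough illustration of the difference (plain random sampling here only stands in for whatever strategy the Tuner actually uses):

import random

batch_size = random.choice([16, 32, 64, 128])   # choice: pick one element from the list
momentum = random.uniform(0, 1)                 # uniform: any float between the low and high bounds

# A resulting configuration handed to one trial might then look like:
# {"batch_size": 32, "hidden_size": 512, "lr": 0.01, "momentum": 0.7371}
print(batch_size, momentum)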
For the model code (mnist.py), see the notes below. You only need to pay attention to:
• Line 159 of the main program: the line starting with nni.
• Two lines starting with nni inside the def main(args) function (lines 118 and 123); these are the code that communicates with NNI.
So the integration is very simple and concise.
You may also want to note how the model's own parameters are defined, how they are merged with the external (NNI-tuned) parameters, and how the parameters are then used. (The code at the link below will be discussed in later chapters; feel free to skip it if it is not clear yet. A minimal sketch of the pattern follows the link.)
nni/mnist.py at master · microsoft/nni · GitHub
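The merging pattern mentioned above usually looks roughly like this (a sketch of the common argparse pattern, not the exact code in mnist.py; the parameter names follow the search space in this chapter):

import argparse

import nni

# 1) The model's own parameters, with defaults, defined via argparse.
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=64)
parser.add_argument("--hidden_size", type=int, default=512)
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--momentum", type=float, default=0.5)
args = parser.parse_args()

# 2) Merge them with the external parameters chosen by the Tuner.
params = vars(args)
params.update(nni.get_next_parameter())

# 3) Use the merged values when building the model and optimizer.
print("training with:", params["batch_size"], params["hidden_size"], params["lr"], params["momentum"])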
After several days of concepts, anyone would fall asleep without some hands-on work. In the next chapter we will install NNI on a local machine.